2 research outputs found

DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention

Authors: Bera Aniket, Kharel Aaditya, Paranjape Manas
Publication date: 12/09/2023

With the rise in manipulated media, deepfake detection has become an imperative task for preserving the authenticity of digital content. In this paper, we present a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks. Our model capitalizes on lip synchronization with input audio through a cross-attention mechanism while extracting visual cues via a fine-tuned VGG-16 network. Subsequently, a transformer encoder network is employed to perform facial self-attention. We conduct multiple ablation studies highlighting different strengths of our approach. Our multi-modal methodology outperforms state-of-the-art multi-modal deepfake detection techniques in terms of F-1 and per-video AUC scores.

arXiv.org e-Print Archive
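
The lip-audio cross-attention mentioned in the abstract above can be illustrated with a minimal scaled dot-product sketch. This is not the authors' implementation: the function names, the toy feature vectors, and the omission of learned query/key/value projections are all assumptions made for illustration. Lip features act as queries and audio features act as keys and values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.

    Hypothetical helper; a real model would first apply learned linear
    projections to the queries, keys, and values.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value rows.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Made-up features: 2 lip frames (queries) and 3 audio frames (keys/values).
lip = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, attn = cross_attention(lip, audio, audio)
```

Each row of `attn` is a probability distribution over audio frames, so a lip frame draws most of its fused representation from the audio frames it matches best.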

Automatic Numerical Methods for Enhancement of Blurred Text-Images via Optimization and Nonlinear Diffusion

Author: Kharel Aaditya
Publication venue: The Aquila Digital Community
Publication date: 01/05/2020

In this paper, we propose an automatic numerical method for solving a nonlinear partial differential equation (PDE) based image-processing model. The Perona-Malik diffusion equation (PME) accounts for both forward and backward diffusion regimes so as to perform simultaneous denoising and deblurring depending on the value of the gradient. One of the limitations of this equation is that a large value of the gradient for backward diffusion can lead to singularity formation or staircasing. Guidotti-Kim-Lambers (GKL) proposed a bound for backward diffusion to prevent staircasing, where backward diffusion is limited to a specific range beyond which backward diffusion is stopped and forward diffusion begins. Our model combines the PME model and the GKL model for automatic sharpening of blurred text-images using Nelder-Mead optimization, a derivative-free optimization method that uses n+1 test points arranged as a simplex for n-dimensional optimization. We solve our model by discretizing the PDE in space using a finite difference approximation scheme. Then, we enhance the image in each iteration using Backward Euler time-stepping and the Minimum Residual Method (MINRES) in MATLAB. Likewise, we propose a gradient-based sharpness metric for our text-images, which also serves as an objective function for our Nelder-Mead optimizer. Our results show that our proposed model is accurate in enhancing text images and predicting the unknown value of the blurring kernel for automatic sharpening. Numerical results show that the proposed objective sharpness measure coincides with the subjective sharpness of the enhanced image.

Aquila Digital Community
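
The abstract above does not give the form of its gradient-based sharpness metric, so the sketch below assumes one plausible choice: the mean squared forward-difference gradient of a grayscale image. The function names, the synthetic image, and the box blur used to fake a degraded input are all hypothetical, and the example is written in Python rather than the MATLAB used in the paper.

```python
def gradient_sharpness(img):
    """Mean squared forward-difference gradient magnitude of a 2D grayscale image.

    Assumed form of the metric; the source does not specify it.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            gx = img[i][j + 1] - img[i][j]  # horizontal difference
            gy = img[i + 1][j] - img[i][j]  # vertical difference
            total += gx * gx + gy * gy
    return total / ((rows - 1) * (cols - 1))


def box_blur(img):
    """3x3 mean filter with edge replication, used only to fake a blurred input."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), rows - 1)
                    jj = min(max(j + dj, 0), cols - 1)
                    acc += img[ii][jj]
            row.append(acc / 9.0)
        out.append(row)
    return out


# Synthetic "text-like" image: a dark vertical stroke on a light background.
sharp = [[0.0 if 3 <= j <= 4 else 1.0 for j in range(8)] for _ in range(8)]
blurred = box_blur(sharp)

score_sharp = gradient_sharpness(sharp)
score_blurred = gradient_sharpness(blurred)
```

Under this assumed metric the blurred stroke scores lower than the sharp one, which is the property a derivative-free optimizer such as Nelder-Mead would need in order to use the score as an objective.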